AN OPTIMALITY THEORY FOR MID p–VALUES IN 2 × 2 CONTINGENCY TABLES
Abstract
The contingency table arises in nearly every application of statistics. However, even the basic problem of testing independence is not fully resolved. More than thirty–five years ago, Lancaster (1961) proposed using the mid p–value for testing independence in a contingency table. The mid p–value is defined as half the conditional probability of the observed statistic plus the conditional probability of more extreme values, given the marginal totals. Recently there seems to be recognition that the mid p–value is quite an attractive procedure, since it tends to be less conservative than the p–value derived from Fisher’s exact test. However, the procedure is considered to be somewhat ad hoc. In this paper we provide theory to justify mid p–values. We apply the Neyman–Pearson fundamental lemma and the estimated truth approach to derive optimal procedures, named expected p–values. The estimated truth approach views p–values as estimators of the truth function, which is one or zero depending on whether the null hypothesis holds or not. A decision theory approach is taken to compare the p–values using risk functions. In the one–sided case, the expected p–value is exactly the mid p–value. For the two–sided case, the expected p–value is a new procedure that can be constructed numerically. In a contingency table of two independent binomial samples with balanced sample sizes, the expected p–value reduces to a two–sided mid p–value. Further, numerical evidence shows that the expected p–values lead to tests whose type I error is very close to the nominal level. Our theory provides strong support for mid p–values.
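The definition of the one–sided mid p–value given in the abstract can be illustrated with a minimal sketch. It is not taken from the paper. It assumes that the test statistic is the top-left cell of the 2 × 2 table, that "more extreme" means larger values of that cell (an upper-tail alternative), and that conditioning on the marginal totals gives a hypergeometric distribution. The function names `hypergeom_pmf` and `one_sided_p_values` are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, n1, n2, m):
    """P(top-left cell = k) given row totals n1, n2 and first-column total m."""
    return comb(n1, k) * comb(n2, m - k) / comb(n1 + n2, m)


def one_sided_p_values(a, b, c, d):
    """Return (fisher_p, mid_p) for the table [[a, b], [c, d]].

    Assumption: the alternative is upper-tailed, so "more extreme"
    means a top-left count strictly larger than the observed a.
    """
    n1, n2, m = a + b, c + d, a + c
    k_max = min(n1, m)  # largest value the top-left cell can take
    p_obs = hypergeom_pmf(a, n1, n2, m)
    p_more = sum(hypergeom_pmf(k, n1, n2, m) for k in range(a + 1, k_max + 1))
    fisher_p = p_obs + p_more      # Fisher's exact one-sided p-value
    mid_p = 0.5 * p_obs + p_more   # half the observed mass plus the tail
    return fisher_p, mid_p


fisher_p, mid_p = one_sided_p_values(3, 1, 1, 3)
print(fisher_p, mid_p)
```

The mid p–value differs from the Fisher p–value by half the probability of the observed table, so it is never the larger of the two. That is the sense in which the abstract calls it less conservative.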
Related resources
Granularity refined by knowledge: contingency tables and rough sets as tools of discovery
Contingency tables represent data in a granular way and are a well-established tool for inductive generalization of knowledge from data. We show that the basic concepts of rough sets, such as concept approximation, indiscernibility, and reduct can be expressed in the language of contingency tables. We further demonstrate the relevance to rough sets theory of additional probabilistic information...
Independence in Multi-Way Contingency Tables: S. N. Roy’s Breakthroughs and Later Developments
In the mid 1950s S. N. Roy and his students contributed two landmark articles to the contingency table literature (Roy and Kastenbaum 1956, Roy and Mitra 1956). The first article generalized concepts of interaction from 2×2×2 contingency tables to three-way tables of arbitrary size and to larger tables. In the second article, which is the source of our primary focus, various notions of independ...
Information Identities and Testing Hypotheses: Power Analysis for Contingency Tables
An information theoretic approach to the evaluation of 2 x 2 contingency tables is proposed. By investigating the relationship between the Kullback-Leibler divergence and the maximum likelihood estimator, information identities are established for testing hypotheses, in particular, testing independence. These identities not only validate the calibration of p values, but also yield unified power...
All Rational Polytopes Are Transportation Polytopes and All Polytopal Integer Sets Are Contingency Tables
We show that any rational polytope is polynomial-time representable as a “slim” r × c× 3 three-way line-sum transportation polytope. This universality theorem has important consequences for linear and integer programming and for confidential statistical data disclosure. It provides polynomial-time embedding of arbitrary linear programs and integer programs in such slim transportation programs a...
Generalized Measure of Departure from No Three-Factor Interaction Model for 2 x 2 x K Contingency Tables
For 2 × 2 × K contingency tables, Tomizawa considered a Shannon entropy type measure to represent the degree of departure from a log-linear model of no three-factor interaction (the NOTFI model). This paper proposes a generalization of Tomizawa’s measure for 2 × 2 × K tables. The measure proposed is expressed by using Patil-Taillie diversity index or Cressie-Read power-divergence. A special cas...